0002012v1, 17 Feb 2000. On The Closest String and Substring Problems ∗

Authors

  • Bin Ma
  • Lusheng Wang
Abstract

The problem of finding a center string that is ‘close’ to every given string has many applications in computational molecular biology and coding theory. This problem has two versions: the Closest String problem and the Closest Substring problem. Assume that we are given a set of strings $S = \{s_1, s_2, \ldots, s_n\}$, each of length $m$. The Closest String problem [1, 2, 4, 5, 11] asks for the smallest $d$ and a string $s$ of length $m$ that is within Hamming distance $d$ of each $s_i \in S$. This problem arises in coding theory when we look for a code not too far away from a given set of codes [4]. The problem is NP-hard [4, 11]. Berman et al. [2] give a polynomial time algorithm for constant $d$. For super-logarithmic $d$, Ben-Dor et al. [1] give an efficient approximation algorithm using the linear programming relaxation technique. The best polynomial time approximation has ratio $\frac{4}{3}$ for all $d$, given by [11] and [5]. The Closest Substring problem looks for a string $t$ that is within Hamming distance $d$ of a substring of each $s_i$. Previously, this problem had only a $2 - \frac{2}{2|\Sigma|+1}$ approximation algorithm [11], and it is much more elusive than the Closest String problem, but it has many applications in finding conserved regions, genetic drug target identification, and genetic probes in molecular biology [8, 9, 10, 16, 17, 19, 20, 21, 22, 23, 11]. Whether each problem admits an efficient approximation algorithm has been a major open question in this area. We present two polynomial time approximation algorithms with approximation ratio $1 + \epsilon$ for any small $\epsilon$ to settle both questions.

∗ Some of the results in this paper have been presented in Proc. 31st ACM Symp. Theory of Computing, May 1999 [12], and in Proc. 11th Symp. Combinatorial Pattern Matching, June 2000 [14].
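The two objectives defined in the abstract can be made concrete with a small sketch. This is not the approximation scheme the paper presents; it only evaluates the quantity each problem minimizes, plus an exhaustive search that is exponential in the string length and usable on tiny inputs only. All function names here are hypothetical and chosen for illustration.

```python
from itertools import product


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def string_radius(center, strings):
    """Closest String objective: the largest Hamming distance
    from the candidate center to any input string."""
    return max(hamming(center, s) for s in strings)


def substring_radius(t, strings):
    """Closest Substring objective: for each input string take its
    best-matching window of length len(t), then report the worst of
    those. Assumes every input string is at least as long as t."""
    width = len(t)
    return max(
        min(hamming(t, s[i:i + width]) for i in range(len(s) - width + 1))
        for s in strings
    )


def brute_force_closest_string(strings, alphabet):
    """Try every string of length m over the alphabet (illustration
    only: this takes |alphabet|**m steps)."""
    m = len(strings[0])
    candidates = ("".join(c) for c in product(alphabet, repeat=m))
    best = min(candidates, key=lambda c: string_radius(c, strings))
    return best, string_radius(best, strings)
```

For the inputs `"000"` and `"111"` over a binary alphabet, a center with k ones is at distance k from the first string and 3 − k from the second, so no center can do better than d = 2. The exhaustive search reports that value.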

Similar resources

arXiv:cs.CC/0205056v1, 21 May 2002. Parameterized Intractability of Motif Search Problems ∗

We show that Closest Substring, one of the most important problems in the field of biological sequence analysis, is W[1]-hard when parameterized by the number k of input strings (and remains so, even over a binary alphabet). This problem is therefore unlikely to be solvable in time O(f(k) · n^c) for any function f of k and constant c independent of k. The problem can therefore be expected to be i...


arXiv:hep-th/0002087v1, 10 Feb 2000. COLO-HEP-442. String Universality

If there is a single underlying “theory of everything” which in some limits of its “moduli space” reduces to the five weakly coupled string theories in 10D, and 11D SUGRA, then it is possible that all six of them have some common domain of validity and that they are in the same universality class, in the sense that the 4D low energy physics of the different theories is the same. We call this no...


arXiv:nucl-ex/0002014v1, 29 Feb 2000. Background Studies for the Neutral Current Detector Array in the Sudbury Neutrino Observatory

An array of ³He-filled proportional counters will be used in the Sudbury Neutrino Observatory to measure the neutral-current interaction of neutrinos and deuterium. We describe the backgrounds to this detection method.


Publication date: 2006